Most voice activity detection (VAD) schemes operate in the discrete Fourier transform (DFT) domain, classifying each sound frame as speech or noise based on its DFT coefficients. Because these coefficients serve as the features for VAD, their robustness strongly affects the performance of the VAD scheme. However, shortcomings of modeling a signal in the DFT domain can easily degrade VAD performance in noisy environments. Instead of using DFT coefficients, this article presents a novel approach that uses the complex coefficients derived from a complex exponential atomic decomposition of the signal. Using a goodness-of-fit test, we show that these coefficients are well modeled by a Gaussian probability distribution. A statistical model is then employed to derive the decision rule from the likelihood ratio test. Experimental results show that the proposed VAD method outperforms VAD based on DFT coefficients in various noise environments.
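
As an illustration of how a likelihood ratio test can yield a frame-level decision under a Gaussian model, the following minimal sketch may help. It is not the decision rule derived in this article. It assumes that each complex coefficient is zero-mean complex Gaussian with variance `noise_var` under the noise-only hypothesis and `noise_var + speech_var` under the speech hypothesis, and that both variances are already known. The function names, the per-coefficient variances, and the threshold are hypothetical and chosen only for illustration.

```python
import math


def frame_log_likelihood_ratio(coeffs, noise_var, speech_var):
    """Mean log-likelihood ratio of one frame (illustrative assumption).

    coeffs     : complex coefficients of the frame
    noise_var  : assumed noise variance for each coefficient
    speech_var : assumed speech variance for each coefficient
    """
    total = 0.0
    for x, lam_n, lam_s in zip(coeffs, noise_var, speech_var):
        gamma = abs(x) ** 2 / lam_n  # a posteriori SNR
        xi = lam_s / lam_n           # a priori SNR
        # log of p(x | speech) / p(x | noise) for zero-mean complex Gaussians
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)
    return total / len(coeffs)


def is_speech(coeffs, noise_var, speech_var, threshold=0.0):
    """Declare speech when the mean log-likelihood ratio exceeds the threshold."""
    return frame_log_likelihood_ratio(coeffs, noise_var, speech_var) > threshold
```

For example, `is_speech([3 + 3j, -3 + 3j], [1.0, 1.0], [4.0, 4.0])` compares the mean log-likelihood ratio of a two-coefficient frame against the default threshold of zero. In practice the variances would have to be estimated from the signal and the threshold tuned for the target noise conditions.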